Getting Started
To get started, simply call weave.init(project=...) at the beginning of your script. Use the project argument to log to a specific W&B Team with team-name/project-name, or pass just project-name to log to your default team/entity.
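For example (the project names below are placeholders):

```python
import weave

# Log to a specific W&B team's project ...
weave.init("my-team/verdict-demo")

# ... or pass only a project name to log to your default team/entity:
# weave.init("verdict-demo")
```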
Tracking Call Metadata
To track metadata from your Verdict pipeline calls, you can use the weave.attributes context manager. This context manager allows you to set custom metadata for a specific block of code, such as a pipeline run or evaluation batch.
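A minimal sketch is shown below. weave.attributes is the real Weave API; the pipeline object and its run(...) call are placeholders standing in for your own Verdict code:

```python
import weave

weave.init("verdict-demo")  # placeholder project name

# `pipeline` is assumed to be a Verdict Pipeline you have already built,
# and `input_data` its input; both are placeholders here.
with weave.attributes({"run_type": "eval-batch", "dataset_version": "v2"}):
    result = pipeline.run(input_data)  # calls inside this block carry the attributes
```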
Traces
Storing traces of AI evaluation pipelines in a central database is crucial during both development and production. These traces are essential for debugging and improving your evaluation workflows by providing a valuable dataset. Weave automatically captures traces for your Verdict applications. It will track and log all calls made through the Verdict library (see the basic sketch after this list), including:
- Pipeline execution steps
- Judge unit evaluations
- Layer transformations
- Pooling operations
- Custom units and transformations
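For example, a basic traced pipeline might look like the following sketch. The weave.init call is standard Weave usage; the Verdict import paths, the prompt template syntax, and Schema.of(...) are assumptions based on typical Verdict usage, so check the Verdict documentation for exact signatures:

```python
import weave
from verdict import Pipeline                 # assumed import path
from verdict.common.judge import JudgeUnit   # assumed import path
from verdict.schema import Schema            # assumed import path

weave.init("verdict-demo")  # placeholder project name

# A single-judge pipeline. Once weave.init() has run, the pipeline execution
# and the underlying judge call are traced automatically in Weave.
pipeline = Pipeline() >> JudgeUnit().prompt(
    "Rate how helpful this response is: {source.response}"  # template syntax is an assumption
)

result = pipeline.run(Schema.of(response="The capital of France is Paris."))
print(result)
```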
Pipeline Tracing Example
Here’s a more complex example (see the sketch after this list) showing how Weave traces nested pipeline operations:
- The main Pipeline execution
- Each JudgeUnit evaluation within the Layer
- The MeanPoolUnit aggregation step
- Timing information for each operation
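A sketch of such a pipeline follows. The Pipeline, Layer, JudgeUnit, and MeanPoolUnit names come from the list above, but the import paths, the Layer repetition argument, the DiscreteScale configuration, and the prompt template syntax are assumptions; adjust them to match your Verdict version:

```python
import weave
from verdict import Pipeline, Layer             # assumed import paths
from verdict.common.judge import JudgeUnit      # assumed import path
from verdict.scale import DiscreteScale         # assumed import path
from verdict.schema import Schema               # assumed import path
from verdict.transform import MeanPoolUnit      # assumed import path

weave.init("verdict-demo")  # placeholder project name

# Three judges score the same response on a 1-5 scale, and MeanPoolUnit
# averages their scores. Weave records the top-level Pipeline run, each
# JudgeUnit call inside the Layer, the pooling step, and per-operation timing.
pipeline = (
    Pipeline()
    >> Layer(
        JudgeUnit(DiscreteScale((1, 5))).prompt(
            "Rate the factual accuracy of: {source.response}"
        ),
        3,  # number of parallel judges; the exact Layer signature is an assumption
    )
    >> MeanPoolUnit()
)

result = pipeline.run(
    Schema.of(response="Water boils at 100 degrees Celsius at sea level.")
)
```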
Configuration
Upon calling weave.init(), tracing is automatically enabled for Verdict pipelines. The integration works by patching the Pipeline.__init__ method to inject a VerdictTracer that forwards all trace data to Weave.
No additional configuration is needed - Weave will automatically:
- Capture all pipeline operations
- Track execution timing
- Log inputs and outputs
- Maintain trace hierarchy
- Handle concurrent pipeline execution
Custom Tracers and Weave
If you’re using custom Verdict tracers in your application, Weave’s VerdictTracer can work alongside them:
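A hypothetical sketch is shown below. The patching behavior described in the Configuration section is what makes this work; the ConsoleTracer import path and the tracer constructor parameter are assumptions, so consult the Verdict documentation for how your version accepts custom tracers:

```python
import weave
from verdict import Pipeline                     # assumed import path
from verdict.util.tracing import ConsoleTracer   # assumed import path

weave.init("verdict-demo")  # placeholder project name

# Because weave.init() patches Pipeline.__init__, Weave's VerdictTracer is
# injected alongside any tracer you pass yourself, so both receive trace data.
pipeline = Pipeline(tracer=ConsoleTracer())  # parameter name is an assumption
```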
Models and Evaluations
Organizing and evaluating AI systems with multiple pipeline components can be challenging. Using the weave.Model class, you can capture and organize experimental details such as prompts, pipeline configurations, and evaluation parameters, making it easier to compare different iterations.
The following example demonstrates wrapping a Verdict pipeline in a Weave Model:
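A sketch of such a wrapper follows. The weave.Model subclass and @weave.op() decorator are standard Weave usage; the pipeline construction inside predict reuses the assumed Verdict API from the earlier sketches:

```python
import weave
from verdict import Pipeline                 # assumed import path
from verdict.common.judge import JudgeUnit   # assumed import path
from verdict.schema import Schema            # assumed import path


class VerdictJudgeModel(weave.Model):
    # Experimental details captured as versioned model attributes.
    judge_prompt: str

    @weave.op()
    def predict(self, response: str) -> dict:
        # Build and run a Verdict pipeline for each prediction; the exact
        # Pipeline/JudgeUnit API shown here is an assumption.
        pipeline = Pipeline() >> JudgeUnit().prompt(self.judge_prompt)
        result = pipeline.run(Schema.of(response=response))
        return {"verdict": result}


weave.init("verdict-demo")  # placeholder project name
model = VerdictJudgeModel(
    judge_prompt="Rate how helpful this response is: {source.response}"
)
model.predict("The capital of France is Paris.")
```

Changing the prompt or pipeline configuration produces a new version of the model in Weave, which is what makes different iterations easy to compare.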
Evaluations
Evaluations help you measure the performance of your evaluation pipelines themselves. By using the weave.Evaluation class, you can capture how well your Verdict pipelines perform on specific tasks or datasets:
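A minimal sketch follows, reusing the VerdictJudgeModel from the previous example. The weave.Evaluation usage is standard Weave; the dataset and scorer are purely illustrative:

```python
import asyncio
import weave

weave.init("verdict-demo")  # placeholder project name

# A tiny illustrative dataset; replace it with your own examples.
dataset = [
    {"response": "The capital of France is Paris."},
    {"response": "Einstein was born in 1979."},
]


@weave.op()
def verdict_present(output: dict) -> dict:
    # Illustrative scorer: adapt it to inspect whatever your Verdict
    # pipeline actually returns (e.g. a numeric score or pass/fail label).
    return {"has_verdict": output.get("verdict") is not None}


evaluation = weave.Evaluation(dataset=dataset, scorers=[verdict_present])

# `model` is the VerdictJudgeModel instance from the previous example.
asyncio.run(evaluation.evaluate(model))
```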